Genome Medicine
Top medRxiv preprints most likely to be published in this journal, ranked by match strength.
Severe combined immunodeficiency (SCID) is a heterogeneous, recessive disorder, associated with the onset of severe, recurrent infections in the first few months of life. SCID is fatal if left untreated, but outcomes can be significantly improved by prompt diagnosis and treatment, particularly prior to onset of infection. Consequently, SCID is already included in many newborn screening programmes around the world, as well as multiple international genomic newborn screening (gNBS) research progra...
Background: Molecular diagnosis of rare disease plateaus at ~50%, partly due to technical limitations of short-read sequencing and the persistent challenge of interpreting variants of uncertain significance (VUS). Splice-altering variation represents a major source of unresolved cases, yet functional assessment remains difficult in routine practice. Methods: We developed a fully modular, sample-to-answer workflow for targeted long-read RNA sequencing (lrRNA-seq) using Oxford Nanopore Technologies...
Genetic diagnosis remains a formidable challenge, characterized by a diagnostic odyssey that spans years. Rare diseases affect more than 300 million people worldwide, and over half of these patients remain undiagnosed. Clinicians must navigate thousands of candidate variants against a noisy and fragmented literature landscape, a task that overwhelms human cognitive capacity and conventional decision-making approaches. Recent advances in agentic artificial intelligence systems have demonstra...
Motivation: Fanconi anemia (FA) is a rare disease mainly caused by biallelic pathogenic variants, including structural variants such as large deletions and insertions in FA genes. Currently, variant detection is based on short-read sequencing and probe-based approaches. However, determining the exact genomic breakpoint or achieving allelic discrimination remains challenging. Nanopore-based long-read sequencing enables a comprehensive detection of FA variants, but a unified bioinformatic analysis p...
Background: Most rare coding variants in monogenic disease genes remain classified as Variants of Uncertain Significance (VUS), limiting their use in clinical care. Many variant classifications have been submitted to ClinVar, often with rich free-text summaries of the evidence underlying each classification. These narratives are not standardized and are difficult to mine systematically, making it challenging to identify variants that might be reclassified as new evidence becomes available. Method...
Background: Variation in the HLA loci, located on human chromosome 6p, has been associated with hundreds of diseases and conditions. However, high levels of polymorphism that characterize the HLA system, coupled with generally modest effect sizes for most phenotypes, necessitate relatively large sample sizes to power association studies; meanwhile, high resolution HLA genotyping remains relatively resource intensive. These constraints limit identification of novel associations. While phenome-wide ...
RNA sequencing (RNA-seq) provides a powerful complement to DNA sequencing for uncovering pathogenic defects affecting gene expression and splicing in individuals with genetically undiagnosed rare disorders. However, as large rare disease consortia adopt RNA-seq, challenges arise due to cohort heterogeneity, variability in tissues and sample sizes, and differences in interpretation practices. Here, we present a harmonized analytical and interpretation framework developed by the pan-European Solv...
The Clinical Pharmacogenetics Implementation Consortium (CPIC) bases its drug-gene recommendations on the assignment of star alleles, which map known genotypes to defined functional categories and corresponding drug dosage guidelines. The star allele framework, first proposed in 1996 for the CYP gene family and later formalized with CPIC's establishment in 2010 [1, 2], remains foundational to pharmacogenomics. However, this system has notable limitations. Its dependence on a restricted set of ben...
Background: Klebsiella pneumoniae is a common cause of neonatal sepsis in Africa, and is frequently hospital-acquired. We recently reported an outbreak of multidrug-resistant K. pneumoniae sepsis amongst neonates at a rural hospital in The Gambia, West Africa, involving 57 cases and a case fatality of 60%. Here we undertook a retrospective pathogen genomic epidemiology study of clinical and environmental K. pneumoniae isolated during the outbreak, to identify the outbreak strain, refine the epidemic...
Background: Pediatric neuromuscular diseases are genetically and clinically heterogeneous. A substantial proportion remain without a definitive genetic diagnosis despite available clinical molecular testing. RNA-sequencing (RNA-seq) can be used to complement genome or exome sequencing to elucidate or to identify the functional impact of variants of uncertain significance, but when manually analyzed is limited to candidate DNA variants or phenotype-driven gene lists. Open-source computational tools...
BACKGROUND: Genetic variant curation, an important step in the implementation of Genomic Medicine, requires literature-guided comparison of variant prevalence in affected individuals versus healthy controls. This evidence is categorized as the PS4 evidence code by the AMP/ACMG variant interpretation guidelines and its manual extraction is a major bottleneck in clinical variant curation. This study aimed to evaluate whether reasoning-capable large language models (LLMs) can support guideline-constr...
Background: Mitochondrial diseases are the most common inherited metabolic disorders, characterized by pronounced clinical and genetic heterogeneity that complicates molecular diagnosis. Although DNA-based sequencing approaches have become standard in genetic testing, up to half of patients remain without a definitive diagnosis. RNA sequencing (RNA-seq) provides a complementary layer of evidence by revealing functional consequences of genetic variation, thereby improving diagnostic yield. Methods...
Exome sequencing (ES) is a cornerstone of clinical genetic diagnosis, yet its application in pharmacogenomics remains limited. While some pharmacogenetic variants are detectable by ES, clinically relevant loci such as CYP2D6, UGT1A1, and HLA remain challenging. We present a robust, comprehensive method to derive a complete pharmacogenomic profile directly from standard ES data. Our method addresses primary limitations of ES for pharmacogenomics, including low coverage and structural complexity a...
Systematic analysis of copy number variants (CNVs) in large datasets is challenging and there are limited studies of homozygous copy number losses in rare disease exome datasets. Here we leveraged the genomic uniqueness and relative under-representation of the Indian population in the current public genomic databases and identified 42,386 possible homozygous losses (median count 20 per individual, range 0 - 55; median size 2.95 kb, range 99 bp - 4.76 Mb) in a heterogeneous cohort of 2,021 indivi...
Klebsiella pneumoniae is a major causative agent of hospital-acquired infections worldwide, contributing substantially to morbidity, mortality, and healthcare burden. The emergence of strains that combine resistance to last-resort antimicrobials with hypervirulence has become a pressing public-health challenge. Despite extensive characterization of the genetic determinants of multidrug resistance and hypervirulence, the relationship between the genetic repertoire of K. pneumoniae and the clinic...
Structural variants (SVs) are a major source of genomic diversity and disease susceptibility; however, populations from the Middle East and North Africa (MENA) region remain critically underrepresented in global reference databases. We provide the first detailed catalogue of structural variation in 61 individuals from diverse MENA countries, using publicly available ultra-long Oxford Nanopore sequencing. A scalable and dual-reference alignment-based method (GRCh38 and T2T-CHM13) was employed to ...
Background: Despite their profound impact on patients' lives, most rare and intractable diseases still lack established treatments. Genomic variants that disrupt normal splicing by creating novel splice sites (splice-site creating variants, SSCVs) substantially contribute to the pathogenesis of those conditions. Deep intronic SSCVs are particularly amenable to antisense oligonucleotide (ASO)-mediated splice modulation, yet many of them remain undetected by conventional genomic analyses. Existing ap...
We present a streamlined, solid-phase workflow for Oxford Nanopore sequencing that integrates DNA extraction, purification, and library preparation within a single microfluidic cartridge. By eliminating tube transfers and performing all enzymatic steps directly on captured DNA, the method minimizes sample loss, reduces hands-on time, and simplifies library generation for long-read sequencing. Starting from volumes as small as a single drop of blood, this integrated approach produces high-quality...
Detecting low variant allele fraction (VAF) mosaic variants without matching controls remains a major challenge in genomics, limited by technical noise, lack of benchmarks, and computational scalability. We present the DRAGEN mosaic caller, a hardware-accelerated approach identifying variants down to ~1-2% VAF with low false-positive rates and hour-scale runtimes for mosaic SNV/indel detection from bulk sequencing. To support evaluation, we introduce a genome-wide low-VAF benchmark for variant...
PTEN hamartoma tumor syndrome (PHTS) is a cancer predisposition disorder caused by germline PTEN variants, yet its full clinical spectrum remains poorly defined due to reliance on highly selected cohorts. Accordingly, PHTS is underrecognized and its prevalence underestimated. Leveraging genomic and electronic health record data from 414,830 participants in the All of Us (AoU) Research Program, we identified 55 individuals with pathogenic or likely pathogenic PTEN variants, the majority of whom l...